Prototype and Feature Selection by Sampling and Random Mutation Hill Climbing Algorithms

نویسنده

  • David B. Skalak
چکیده

With the goal of reducing computational costs without sacrificing accuracy, we describe two algorithms to find sets of prototypes for nearest neighbor classification. Here, the term “prototypes” refers to the reference instances used in a nearest neighbor computation — the instances with respect to which similarity is assessed in order to assign a class to a new data item. Both algorithms rely on stochastic techniques to search the space of sets of prototypes and are simple to implement. The first is a Monte Carlo sampling algorithm; the second applies random mutation hill climbing. On four datasets we show that only three or four prototypes sufficed to give predictive accuracy equal or superior to a basic nearest neighbor algorithm whose run-time storage costs were approximately 10 to 200 times greater. We briefly investigate how random mutation hill climbing may be applied to select features and prototypes simultaneously. Finally, we explain the performance of the sampling algorithm on these datasets in terms of a statistical measure of the extent of clustering displayed by the target classes.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Feature Subset Selection as Search with Probabilistic Estimates

Irrelevant features and weakly relevant features may reduce the comprehensibility and accuracy of concepts induced by supervised learning algorithms. We formulate the search for a feature subset as an abstract search problem with probabilistic estimates. Searching a space using an evaluation function that is a random variable requires trading off accuracy of estimates for increased state explor...

متن کامل

Subset Selection as Search with Probabilistic Estimates

Irrelevant features and weakly relevant features may reduce the comprehensibility and accuracy of concepts induced by supervised learning algorithms. We formulate the search for a feature subset as an abstract search problem with probabilistic estimates. Searching a space using an evaluation function that is a random variable requires trading o accuracy of estimates for increased state explorat...

متن کامل

Sign Language Classification Using Search Algorithms

We concentrate on the problem of feature selection on sign language classification. The goal is to maximize the classification accuracy using a proper subset of features out of totally 22 given features. In this work, we employ two search methods: Hill Climbing approach and Random Walk approach, to select the features. We claim that both algorithms are easy-implemented but reasonable and effici...

متن کامل

Induction Algorithm Induction Algorithm

Irrelevant features and weakly relevant features may reduce the comprehensibility and accuracy of concepts induced by supervised learning algorithms. We formulate the search for a feature subset as an abstract search problem with probabilistic estimates. Searching a space using an evaluation function that is a random variable requires trading oo accuracy of estimates for increased state explora...

متن کامل

Comparison of Genetic and Hill Climbing Algorithms to Improve an Artificial Neural Networks Model for Water Consumption Prediction

No unique method has been so far specified for determining the number of neurons in hidden layers of Multi-Layer Perceptron (MLP) neural networks used for prediction. The present research is intended to optimize the number of neurons using two meta-heuristic procedures namely genetic and hill climbing algorithms. The data used in the present research for prediction are consumption data of water...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1994